Section: New Results

Human Action Recognition in Videos with Local Representation

Participants : Michal Koperski, François Brémond.

Keywords: Computer Vision, Action Recognition, Machine Learning, Deep Learning, Artificial Intelligence

This work targets the recognition of human actions in videos. Action recognition can be defined as the ability to determine whether a given action occurs in a video. The problem is difficult because of the high variability of human actions: appearance changes, variations in motion patterns, occlusions, etc.

Recent advances in both hand-crafted and deep-learning methods have significantly improved action recognition accuracy, but many open questions keep the task far from solved.

Current state-of-the-art methods achieve satisfactory results mostly with features that describe a local spatio-temporal neighborhood. Human actions are complex, however, so the next question to answer is how to model the relationships between local features, especially in their spatio-temporal context.

In previous years, we proposed two methods that address this challenging problem. In the first [49], we measure pairwise relationships between features with Brownian Covariance. In the second [83], we model the spatial layout of features with respect to the person bounding box, achieving results better than or comparable to those of skeleton-based methods. Both methods are generic and can improve hand-crafted as well as deep-learning based methods.
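
To make the first idea concrete, the statistic at the core of [49] is the sample distance covariance (Brownian covariance), which is zero only when two feature sets are independent and therefore also captures non-linear pairwise relationships. The following minimal Python sketch computes it between two paired feature sets; how local features are extracted and pooled around this statistic is an assumption of the sketch, not the exact pipeline of [49].

import numpy as np

def distance_covariance(X, Y):
    """Squared sample distance covariance between paired observations.

    X: (n, p) array, e.g. appearance features of n local patches.
    Y: (n, q) array, e.g. motion features of the same n patches.
    """
    n = X.shape[0]
    # Pairwise Euclidean distance matrices within each feature set.
    a = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    b = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Double-centering: subtract row/column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).sum() / (n * n)

# Toy check: non-linearly dependent features score higher than independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
print(distance_covariance(X, X ** 2))                      # dependent
print(distance_covariance(X, rng.normal(size=(100, 8))))   # independent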

Another open question is whether 3D information can improve action recognition. Most state-of-the-art methods work on RGB data, which lacks 3D information, and many methods use 3D information only to obtain body joints, which are still challenging to detect reliably. In our previous work [82], we showed that 3D information can serve beyond joint detection by proposing a novel descriptor built on 3D trajectories computed from RGB-D data.
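
As a rough illustration of the 3D-trajectory idea, a point tracked in the RGB stream can be lifted to 3D by back-projecting it with the aligned depth map, and the normalized 3D displacements then form the trajectory descriptor. The pinhole intrinsics and the L1 normalization below are illustrative assumptions, not the exact formulation of [82].

import numpy as np

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def backproject(u, v, z):
    """Back-project pixel (u, v) with depth z (meters) to camera coordinates."""
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

def trajectory_descriptor(track, depth_maps):
    """track: list of (u, v) pixel positions over T frames.
    depth_maps: T depth images aligned with the RGB frames."""
    pts = np.array([backproject(u, v, depth_maps[t][int(v), int(u)])
                    for t, (u, v) in enumerate(track)])
    disp = np.diff(pts, axis=0)      # frame-to-frame 3D displacements
    norm = np.abs(disp).sum()        # L1-normalize the trajectory shape
    return (disp / norm).ravel() if norm > 0 else disp.ravel()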

This year we provide a comprehensive study of the methods proposed in previous years, as part of the PhD thesis [22] defended on 9 November 2017. In the evaluation, we focus particularly on daily living actions, performed by people in their daily self-care routine, such as eating, drinking and cooking. Recognizing such actions is particularly important for patient monitoring systems in hospitals and nursing homes, and it is also a key component of assistive robots.

To evaluate the proposed methods, we created a large-scale dataset consisting of 160 hours of video footage of 20 senior people, recorded in 3 different rooms by 7 RGB-D sensors, and annotated with 28 action classes. The actions are performed in an unacted and unsupervised way, so the dataset introduces real-world challenges that are absent from many public datasets.

We also proposed a new GHOG descriptor, which captures coarse static pose information from the person bounding box without requiring skeleton detection. In the PhD thesis, we show that fusing GHOG with descriptors that capture dynamic information (e.g., [49], [83], [82]) leads to a significant improvement in recognition accuracy.
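
The sketch below illustrates a GHOG-style descriptor as a single coarse HOG computed over the resized person bounding box, fused with a dynamic descriptor by simple concatenation; the grid parameters and the fusion scheme are assumptions of the sketch, not the exact settings of the thesis.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def ghog(frame_gray, bbox, out_size=(64, 64)):
    """frame_gray: 2D grayscale image; bbox: (x, y, w, h) person box."""
    x, y, w, h = bbox
    crop = resize(frame_gray[y:y + h, x:x + w], out_size)
    # A coarse global grid keeps only rough static pose information.
    return hog(crop, orientations=8, pixels_per_cell=(16, 16),
               cells_per_block=(1, 1))

def fuse(static_desc, dynamic_desc):
    """Late fusion by concatenating L2-normalized descriptors."""
    s = static_desc / (np.linalg.norm(static_desc) + 1e-8)
    d = dynamic_desc / (np.linalg.norm(dynamic_desc) + 1e-8)
    return np.concatenate([s, d])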

Finally, we argue that the ability to process video in real time will be a key factor in future action recognition applications. All methods proposed in this work can run in real time, and we demonstrated this empirically by building a real-time action detection system, which has been successfully adopted by Toyota in their robotic systems.

We also evaluated our methods on our Smarthomes dataset as well as on the publicly available CAD60, CAD120 and MSRDailyActivity3D datasets. Our experiments show that the methods proposed in this thesis improve on state-of-the-art results.

A more detailed description can be found in the PhD thesis [22].